Adaptive Exploration Using Stochastic Neurons
Authors
Abstract
Stochastic neurons are deployed to adapt exploration parameters efficiently by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning with discrete actions. Its main advantage is memory efficiency, because exploratory data need only be memorized for starting states. Hence, if a learning problem has only one starting state, the exploratory data can be considered global. Results suggest that the presented approach can be combined efficiently with standard off- and on-policy algorithms such as Q-learning and Sarsa.
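The abstract does not state the update rules, so the following Python is only a minimal sketch under our own assumptions. It models the exploration rate ε of ε-greedy as the firing probability sigmoid(θ) of a single Bernoulli-logistic stochastic neuron and adapts θ with Williams' REINFORCE rule, using the discounted episode return minus a running baseline. One neuron is stored per starting state, mirroring the memory argument above. The toy chain task, the names `ExplorationNeuron`, `step`, `greedy` and `run`, and all hyperparameters are hypothetical; Q-learning serves as the off-policy learner.

```python
import math
import random

# Hypothetical toy task: a 5-state chain, start at 0, reward 1 on entering state 4.
N_STATES, GOAL, GAMMA, ALPHA = 5, 4, 0.9, 0.5
ACTIONS = (0, 1)  # 0 = left, 1 = right


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ExplorationNeuron:
    """Hypothetical Bernoulli-logistic unit: fires ("explore") with
    probability epsilon = sigmoid(theta); theta is trained by REINFORCE."""

    def __init__(self, theta: float = 0.0, lr: float = 0.05):
        self.theta = theta
        self.lr = lr
        self.trace = 0.0     # sum over the episode of d/dtheta log P(y)
        self.baseline = 0.0  # running mean of episode returns

    @property
    def epsilon(self) -> float:
        return sigmoid(self.theta)

    def fire(self, rng: random.Random) -> bool:
        p = self.epsilon
        y = 1.0 if rng.random() < p else 0.0
        self.trace += y - p  # gradient of log Bernoulli(y; sigmoid(theta))
        return y == 1.0

    def end_episode(self, ret: float) -> None:
        # REINFORCE step: (return - baseline) times the accumulated log-prob gradient.
        self.theta += self.lr * (ret - self.baseline) * self.trace
        self.theta = max(-20.0, min(20.0, self.theta))  # keep sigmoid away from 0 and 1
        self.baseline += 0.1 * (ret - self.baseline)
        self.trace = 0.0


def step(s: int, a: int):
    s2 = max(s - 1, 0) if a == 0 else s + 1
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def greedy(q, s: int, rng: random.Random) -> int:
    best = max(q[s])
    return rng.choice([a for a in ACTIONS if q[s][a] == best])  # random tie-break


def run(episodes: int = 500, max_steps: int = 100, seed: int = 0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    start = 0
    neurons = {start: ExplorationNeuron()}  # exploration state kept only per start state
    for _ in range(episodes):
        neuron = neurons[start]
        s, ret, disc = start, 0.0, 1.0
        for _ in range(max_steps):
            a = rng.choice(ACTIONS) if neuron.fire(rng) else greedy(q, s, rng)
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[s2])
            q[s][a] += ALPHA * (target - q[s][a])  # off-policy Q-learning update
            ret += disc * r
            disc *= GAMMA
            s = s2
            if done:
                break
        neuron.end_episode(ret)
    return q, neurons[start]


if __name__ == "__main__":
    q, neuron = run()
    policy = [greedy(q, s, random.Random(0)) for s in range(GOAL)]
    print("greedy policy:", policy, "epsilon:", round(neuron.epsilon, 3))
```

Because Q-learning is off-policy, the learned greedy policy here does not depend on how much the neuron explores. Replacing `max(q[s2])` with `q[s2][a2]` for the next action `a2` actually selected (which requires choosing `a2` before the update) would give an on-policy, Sarsa-style variant, in which the adapted ε also shapes the learned values.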
Similar resources
Design space exploration of stochastic system-of-systems simulations using adaptive sequential experiments
Market Adaptive Control Function Optimization in Continuous Cover Forest Management
Economically optimal management of a continuous cover forest is considered here. Initially, there is a large number of trees of different sizes and the forest may contain several species. We want to optimize the harvest decisions over time, using continuous cover forestry, which is denoted by CCF. We maximize our objective function, the expected present value, with consideration of stochastic p...
Adaptive Fractional-order Control for Synchronization of Two Coupled Neurons in the External Electrical Stimulation
This paper addresses the synchronization of two coupled chaotic FitzHugh–Nagumo (FHN) neurons with a weak gap junction under external electrical stimulation (EES). To transmit information among the coupled neurons, the integer-order FHN equations of the coupled system are generalized to fractional order in the frequency domain using the CRONE approach, and the behavior of each coupled neuron relies on its pa...
Target Detection in Bistatic Passive Radars by Using Adaptive Processing Based on Correntropy Cost Function
In this paper, a novel method for target detection in bistatic passive radars is introduced that uses the concept of correntropy to distinguish correct targets from false detections. In the proposed method, the history of each cell of the ambiguity function is modeled as a stochastic process. The stochastic processes consisting of noise are then differentiated from those containing targets by constructing...
Information Complexity in Bandit Subset Selection
We consider the problem of efficiently exploring the arms of a stochastic bandit to identify the best subset of a specified size. Under the PAC and the fixed-budget formulations, we derive improved bounds by using KL-divergence-based confidence intervals. Whereas the application of a similar idea in the regret setting has yielded bounds in terms of the KL-divergence between the arms, our bounds...